The AUTONOMATA Spoken Names Corpus
نویسندگان
چکیده
In the Autonomata project we have collected a corpus of spoken name utterances with manually corrected phonemic transcriptions of these utterances. The corpus was designed with the intention to become a major resource for the development of automatic speech recognition engines that can achieve a high accuracy on the recognition of person and geographical names spoken in Dutch. The recorded names were selected so as to reveal the major pronunciation variations that a speech recognizer of e.g. a navigation system with speech input is going to be confronted with. This includes native speakers speaking foreign names and vice versa.
منابع مشابه
Evaluating Repetitions, or how to Improve your Multilingual ASR System by doing Nothing
Repetition is a common concept in human communication. This paper investigates possible benefits of repetition for automatic speech recognition under controlled conditions. Testing is performed on the newly created Autonomata TOO speech corpus, consisting of multilingual names for Points-Of-Interest as spoken by both native and non-native speakers. During corpus recording, ASR was being perform...
متن کاملA split lexicon approach for improved recognition of spoken names
Recognition of spoken names is a challenging task for automatic speech recognition systems because the list of names for applications such as directory assistance tends to be in the order of several hundred thousands. This makes spoken name recognition a very high perplexity task. In this paper we propose the use of syllables as the acoustic unit for spoken name recognition based on reverse loo...
متن کاملLanguage Models for Name Recognition in Spanish Spoken Dialogue Systems
Current advances on dialogue system require the development of language models for automatic speech recognition that are not only domain or task specific but also sub-task specific (e.g. name, age or price recognition). This paper presents a method for the creation of language models for name recognition at the greeting stage of a conversation in spoken Spanish. In particular, we focus on the i...
متن کاملپیکره اعلام: یک پیکره استاندارد واحدهای اسمی برای زبان فارسی
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...
متن کاملA telephone speech database of spelled and spoken names
This report describes a telephone speech corpus collected at the Oregon Graduate Institute's Center for Spoken Language Understanding. Over four thousand people called in response to public requests. They were prompted by a recorded voice to say and spell their rst and last names|with and without pauses, to say what city they grew up in and what city they were calling from, and to answer two ye...
متن کامل